Scalable Strategies for Computing with Massive Data
Related resources
Scalable Strategies for Computing with Massive Data
This paper presents two complementary statistical computing frameworks that address challenges in parallel processing and the analysis of massive data. First, the foreach package allows users of the R programming environment to define parallel loops that may be run sequentially on a single machine, in parallel on a symmetric multiprocessing (SMP) machine, or in cluster environments without plat...
A scalable bootstrap for massive data
The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets—which are increasingly prevalent— the computation of bootstrap-based quantities can be prohibitively demanding computationally. While variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computation...
Efficient Data Mining with Evolutionary Algorithms for Cloud Computing Application
With the rapid development of the internet, the amount of information and data being produced is extremely large. Clients can therefore be overwhelmed by the volume of data, and it is difficult to tell which parts are useful. Data mining can address this problem. When data mining is run on cloud computing, it reduces processing time, energy usage and costs. As the speed of ...
Scalable Storage for Data-Intensive Computing
Cloud computing applications require a scalable, elastic and fault tolerant storage system. We survey how storage systems have evolved from the traditional distributed filesystems, peer-to-peer storage systems and how these ideas have been synthesized in current cloud computing storage systems. Then, we describe how metadata management can be improved for a file system built to support large sc...
Scalable Splitting of Massive Data Streams
Scalable execution of continuous queries over massive data streams often requires splitting input streams into parallel sub-streams over which query operators are executed in parallel. Automatic stream splitting is in general very difficult, as the optimal parallelization may depend on application semantics. To enable application specific stream splitting, we introduce splitstream functions whe...
Journal
Journal title: Journal of Statistical Software
Year: 2013
ISSN: 1548-7660
DOI: 10.18637/jss.v055.i14